Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation
Authors
Abstract
Large-scale image annotation is a challenging task in image content analysis, which aims to annotate each image of a very large dataset with multiple class labels. In this paper, we focus on two main issues in large-scale image annotation: 1) how to learn stronger features for multifarious images; 2) how to annotate an image with an automatically-determined number of class labels. To address th...
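The abstract names two generic ingredients: multi-scale feature learning and a per-image, automatically determined number of labels. As a rough illustration of that general recipe only, and not of the architecture proposed in the paper, the PyTorch sketch below fuses two convolutional scales and thresholds per-label sigmoid scores; every module, dimension, and the threshold value are assumptions introduced here for illustration.

```python
# Illustrative sketch only: a generic multi-scale, multi-label classifier with a
# score threshold used to pick a per-image number of labels. All module and
# parameter names are assumptions, not the model from the paper.
import torch
import torch.nn as nn

class MultiScaleAnnotator(nn.Module):
    def __init__(self, num_labels: int, backbone_dim: int = 256):
        super().__init__()
        # Two convolutional branches with different receptive fields stand in
        # for "multi-scale" feature learning.
        self.branch_fine = nn.Sequential(
            nn.Conv2d(3, backbone_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.branch_coarse = nn.Sequential(
            nn.Conv2d(3, backbone_dim, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        # Fused features feed a multi-label head (one logit per class label).
        self.classifier = nn.Linear(2 * backbone_dim, num_labels)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        fine = self.branch_fine(images).flatten(1)
        coarse = self.branch_coarse(images).flatten(1)
        return self.classifier(torch.cat([fine, coarse], dim=1))

def annotate(model: nn.Module, images: torch.Tensor, threshold: float = 0.5):
    """Return, per image, the label indices whose sigmoid score exceeds the
    threshold, so the number of predicted labels varies image by image."""
    with torch.no_grad():
        scores = torch.sigmoid(model(images))
    return [torch.nonzero(s > threshold).flatten().tolist() for s in scores]

model = MultiScaleAnnotator(num_labels=81)
print(annotate(model, torch.randn(2, 3, 64, 64)))
```

Such a head would typically be trained with a per-label binary cross-entropy loss (e.g., nn.BCEWithLogitsLoss), and the fixed threshold used here could instead be learned or calibrated per label.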
Similar Resources
Online Multi-Label Active Learning for Large-Scale Multimedia Annotation
Existing video search engines have not taken advantage of video content analysis and semantic understanding. Video search in academia uses semantic annotation to approach content-based indexing. We argue this is a promising direction for enabling real content-based video search. However, due to the complexity of both video data and semantic concepts, existing techniques for automatic video ann...
Deep Multi-Modal Image Correspondence Learning
Inference of correspondences between images from different modalities is an extremely important perceptual ability that enables humans to understand and recognize cross-modal concepts. In this paper, we consider an instance of this problem that involves matching photographs of building interiors with their corresponding floorplan. This is a particularly challenging problem because a floorplan, a...
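A common way to realize this kind of cross-modal matching is a two-branch embedding network whose outputs are compared by cosine similarity. The sketch below illustrates that general pattern only, not the specific model of this paper; the encoders, dimensions, and data shapes are all assumptions for illustration.

```python
# Minimal two-branch cross-modal matching sketch: embed each modality into a
# shared space and rank candidates by cosine similarity. Assumed architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_encoder(embed_dim: int = 128) -> nn.Module:
    # A tiny CNN encoder; in practice each modality (photo vs. floorplan)
    # would get its own, much deeper, network.
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, embed_dim))

photo_encoder = make_encoder()
plan_encoder = make_encoder()

photos = torch.randn(4, 3, 64, 64)      # photographs of interiors
floorplans = torch.randn(4, 3, 64, 64)  # rasterized floorplan images

# Embed both modalities into a shared space and score all photo/floorplan pairs.
photo_emb = F.normalize(photo_encoder(photos), dim=1)
plan_emb = F.normalize(plan_encoder(floorplans), dim=1)
similarity = photo_emb @ plan_emb.t()   # (4, 4) cosine similarities
best_match = similarity.argmax(dim=1)   # most similar floorplan per photo
print(best_match)
```

In practice the two encoders are trained jointly, typically with a contrastive or triplet objective that pulls matching photo/floorplan pairs together in the shared space.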
Efficient Large-Scale Multi-Modal Classification
While the incipient internet was largely text-based, the modern digital world is becoming increasingly multi-modal. Here, we examine multi-modal classification where one modality is discrete, e.g. text, and the other is continuous, e.g. visual representations transferred from a convolutional neural network. In particular, we focus on scenarios where we have to be able to classify large quantiti...
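The setting described here, one discrete modality (text) and one continuous modality (CNN features), is often handled with a simple late-fusion classifier. The sketch below shows that generic pattern rather than this paper's method; the bag-of-words input, feature dimensions, and fusion choice are assumptions.

```python
# Hedged sketch of late fusion for multi-modal classification: a discrete text
# modality (bag-of-words presence vector) concatenated with continuous visual
# features (e.g., pooled CNN activations). All sizes are assumptions.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, vocab_size: int, visual_dim: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.text_proj = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.visual_proj = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, text_counts: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # Concatenating the two projected modalities is the simplest fusion;
        # efficiency-oriented systems often keep the text branch this cheap.
        fused = torch.cat([self.text_proj(text_counts), self.visual_proj(visual_feats)], dim=1)
        return self.head(fused)

model = LateFusionClassifier(vocab_size=10000, visual_dim=2048, num_classes=50)
text = torch.zeros(2, 10000).scatter_(1, torch.randint(0, 10000, (2, 20)), 1.0)  # word presence
visual = torch.randn(2, 2048)  # e.g., pooled features from a pretrained CNN
logits = model(text, visual)
print(logits.shape)  # torch.Size([2, 50])
```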
Multi-Modal Image Annotation with Multi-Label Multi-Instance LDA
This paper studies the problem of image annotation in a multi-modal setting where both visual and textual information are available. We propose Multi-modal Multi-instance Multi-label Latent Dirichlet Allocation (M3LDA), where the model consists of a visual-label part, a textual-label part, and a label-topic part. The basic idea is that the topic decided by the visual information and the topic deci...
Journal
Journal title: IEEE Transactions on Image Processing
Year: 2019
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/tip.2018.2881928